We present PRM-RL, a hierarchical method for long-range navigation task completion that combines sampling-based path planning with reinforcement learning (RL) agents. The RL agents learn short-range, point-to-point navigation policies that capture robot dynamics and task constraints without knowledge of the large-scale topology, while the sampling-based planners provide an approximate map of the space of possible robot configurations, from which collision-free trajectories feasible for the RL agents can be identified. The same RL agents are used to control the robot under the direction of the planner, enabling long-range navigation. We use Probabilistic Roadmaps (PRMs) as the sampling-based planner. The RL agents are constructed using feature-based and deep neural network policies in continuous state and action spaces. We evaluate PRM-RL on two navigation tasks with non-trivial robot dynamics: end-to-end differential-drive indoor navigation in office environments, and aerial cargo delivery in urban environments with load-displacement constraints. The evaluations include both simulated environments and on-robot tests. Our results show improved navigation task completion over both RL agents on their own and traditional sampling-based planners. In the indoor navigation task, PRM-RL successfully completes trajectories up to 215 meters long under noisy sensor conditions, and in the aerial cargo delivery task it completes flights of over 1000 meters without violating the task constraints, in an environment 63 million times larger than the one used in training.
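The core idea stated above is that roadmap nodes are connected only when the short-range RL agent can itself reliably navigate between them, and the same agent then executes the resulting waypoint path. The following is a minimal Python sketch of that construction under stated assumptions; the helper names (`sample_free_configuration`, `rollout_policy`, the rollout count, and the success threshold) are hypothetical illustrations, not the paper's implementation.

```python
import math
import random

# Hypothetical stand-ins; a real system would query the robot simulator
# and the trained short-range RL policy here.
def sample_free_configuration(workspace):
    """Sample a random collision-free (x, y) configuration (placeholder)."""
    return (random.uniform(0, workspace), random.uniform(0, workspace))

def rollout_policy(start, goal):
    """Simulate one episode of the short-range RL policy from start
    toward goal; return True on success (placeholder)."""
    return True

def rl_agent_reaches(start, goal, trials=20, success_threshold=0.9):
    """Monte Carlo check: the edge counts as traversable only if the
    RL policy reaches the goal in most simulated rollouts."""
    successes = sum(rollout_policy(start, goal) for _ in range(trials))
    return successes / trials >= success_threshold

def build_prm_rl_roadmap(n_nodes=200, connect_radius=10.0, workspace=100.0):
    """Build a PRM whose edges are validated by the RL agent rather
    than by a straight-line collision check."""
    nodes = [sample_free_configuration(workspace) for _ in range(n_nodes)]
    edges = {i: [] for i in range(n_nodes)}
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            if math.dist(nodes[i], nodes[j]) <= connect_radius:
                if rl_agent_reaches(nodes[i], nodes[j]):
                    edges[i].append(j)
                    edges[j].append(i)
    return nodes, edges

def execute_path(waypoints):
    """At execution time, the same RL policy steers the robot between
    consecutive waypoints returned by graph search over the roadmap."""
    for start, goal in zip(waypoints, waypoints[1:]):
        rollout_policy(start, goal)
```

In this sketch, the expensive step is edge validation: each candidate edge triggers repeated policy rollouts, which is why the roadmap captures only transitions the RL agent can actually perform under its dynamics and task constraints.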